reading time
Quantifying the Effects of Word Length, Frequency, and Predictability on Dyslexia
Rydel-Johnston, Hugo, Kafkas, Alex
Division of Psychology, Communication & Human Neuroscience, The University of Manchester
Author Note: Hugo Rydel-Johnston https://orcid.org/0009-0006-1103-1015; Alex Kafkas https://orcid.org/0000-0001-5133-8827. We have no conflicts of interest to disclose. Correspondence concerning this article should be addressed to Hugo Rydel-Johnston, Division of Psychology, Communication & Human Neuroscience, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK.
Abstract: We ask where, and under what conditions, dyslexic reading costs arise in a large-scale naturalistic reading dataset. Using eye-tracking aligned to word-level properties -- word length, frequency, and predictability -- we model the influence of each of these features on dyslexic time costs. We find that all three properties robustly change reading times in both typical and dyslexic readers, but dyslexic readers show stronger sensitivities to each of the three features, especially predictability. Counterfactual manipulations of these features substantially narrow the dyslexic-control gap -- by about one third -- with predictability showing the strongest effect, followed by length and frequency. These patterns align with existing dyslexia theories suggesting heightened demands on linguistic working memory and phonological encoding in dyslexic reading, and they directly motivate further research into lexical complexity and preview benefits to further explain the quantified gap. In effect, these findings break down when extra dyslexic costs arise and how large they are, and provide actionable guidance for the development of interventions and computational models for dyslexic readers.
Keywords: eye movements, reading time, word length, lexical frequency, predictability, skipping, total reading time
Why Dyslexic Reading Takes Longer - And When: Dyslexia is characterized by persistent difficulty in accurate and/or fluent word recognition and decoding (Lyon et al., 2003) and affects between 4% and 8% of individuals (Yang et al., 2022; Doust et al., 2022).
- Europe > United Kingdom (0.24)
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.30)
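As a concrete illustration of the kind of analysis the dyslexia abstract above describes, here is a minimal sketch on synthetic data: a reading-time regression with word length, frequency, and predictability interacting with group, followed by a counterfactual shift in predictability. All column names, coefficients, and the size of the counterfactual shift are illustrative assumptions, not values from the paper.

```python
# Sketch only: synthetic data, made-up coefficients.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    "length": rng.integers(2, 12, n),
    "log_freq": rng.normal(0, 1, n),
    "predictability": rng.uniform(0, 1, n),
    "dyslexic": rng.integers(0, 2, n),          # 0 = control, 1 = dyslexic
})
# Simulate stronger sensitivities for the dyslexic group (illustrative only).
df["rt"] = (250 + 15 * df.length - 20 * df.log_freq - 60 * df.predictability
            + df.dyslexic * (80 + 10 * df.length - 10 * df.log_freq - 40 * df.predictability)
            + rng.normal(0, 40, n))

# Word-property effects and their interactions with group.
model = smf.ols("rt ~ (length + log_freq + predictability) * dyslexic", df).fit()
print(model.params)

def group_gap(data):
    """Predicted dyslexic-minus-control mean reading time."""
    pred = model.predict(data)
    return pred[data.dyslexic == 1].mean() - pred[data.dyslexic == 0].mean()

# Counterfactual: make every word more predictable and see how much of the gap closes.
cf = df.assign(predictability=np.minimum(df.predictability + 0.2, 1.0))
print("observed gap:", round(group_gap(df), 1), "counterfactual gap:", round(group_gap(cf), 1))
```

For real eye-tracking data, a mixed-effects model with by-subject and by-item random effects would be the more standard choice; plain OLS keeps the sketch short.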
Modeling Bottom-up Information Quality during Language Processing
Ding, Cui, Yin, Yanning, Jäger, Lena A., Wilcox, Ethan Gotlieb
Contemporary theories model language processing as integrating both top-down expectations and bottom-up inputs. One major prediction of such models is that the quality of the bottom-up inputs modulates ease of processing -- noisy inputs should lead to difficult and effortful comprehension. We test this prediction in the domain of reading. First, we propose an information-theoretic operationalization for the "quality" of bottom-up information as the mutual information (MI) between visual information and word identity. We formalize this prediction in a mathematical model of reading as a Bayesian update. Second, we test our operationalization by comparing participants' reading times in conditions where words' information quality has been reduced, either by occluding their top or bottom half, with full words. We collect data in English and Chinese. We then use multimodal language models to estimate the mutual information between visual inputs and words. We use these data to estimate the specific effect of reduced information quality on reading times. Finally, we compare how information is distributed across visual forms. In English and Chinese, the upper half contains more information about word identity than the lower half. However, the asymmetry is more pronounced in English, a pattern which is reflected in the reading times.
- Europe > Austria > Vienna (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Singapore (0.04)
- (4 more...)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
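To make the information-theoretic operationalization above concrete, here is a toy sketch of the quantity MI(V; W) = H(W) - E_V[H(W | V)]: the average reduction in uncertainty about word identity W after seeing visual input V. The probabilities below are invented for illustration; in the paper, posteriors over word identity are estimated with multimodal language models.

```python
# Toy numbers only; in the paper, posteriors come from multimodal language models.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(prior, posteriors):
    # MI(V; W) = H(W) - average of H(W | V) over sampled visual inputs.
    return entropy(prior) - float(np.mean([entropy(p) for p in posteriors]))

prior = np.array([0.5, 0.3, 0.2])                      # p(W) over candidate words

posteriors_full = [np.array([0.97, 0.02, 0.01]),       # p(W | V) for intact words
                   np.array([0.02, 0.96, 0.02])]
posteriors_occluded = [np.array([0.60, 0.30, 0.10]),   # p(W | V) with the top or
                       np.array([0.20, 0.50, 0.30])]   # bottom half occluded

print("MI, full words:    ", round(mutual_information(prior, posteriors_full), 3))
print("MI, occluded words:", round(mutual_information(prior, posteriors_occluded), 3))
# Lower MI = lower bottom-up information quality, which the model links to
# slower, more effortful reading.
```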
To model human linguistic prediction, make LLMs less superhuman
When people listen to or read a sentence, they actively make predictions about upcoming words: words that are less predictable are generally read more slowly than predictable ones. The success of large language models (LLMs), which, like humans, make predictions about upcoming words, has motivated exploring the use of these models as cognitive models of human linguistic prediction. Surprisingly, in the last few years, as language models have become better at predicting the next word, their ability to predict human reading behavior has declined. This is because LLMs are able to predict upcoming words much better than people can, leading them to predict lower processing difficulty in reading than observed in human experiments; in other words, mainstream LLMs are 'superhuman' as models of language comprehension. In this position paper, we argue that LLMs' superhumanness is primarily driven by two factors: compared to humans, LLMs have much stronger long-term memory for facts and training examples, and they have much better short-term memory for previous words in the text. We advocate for creating models that have human-like long-term and short-term memory, and outline some possible directions for achieving this goal. Finally, we argue that currently available human data is insufficient to measure progress towards this goal, and outline human experiments that can address this gap.
- North America > United States > Maryland > Baltimore (0.04)
- Europe > United Kingdom > Scotland > Stirling > Stirling (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
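The argument above turns on LLM surprisal being lower than human-derived predictability estimates. As a reference point, here is a minimal sketch of computing per-token surprisal from a small language model (GPT-2 via Hugging Face transformers); the model choice and example sentence are illustrative and not taken from the paper.

```python
# Illustrative model choice (GPT-2); requires the transformers and torch packages.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "The children went outside to play in the"
ids = tokenizer(text, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits

# Surprisal of token t given the preceding tokens, converted from nats to bits.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -log_probs[torch.arange(ids.shape[1] - 1), ids[0, 1:]] / torch.log(torch.tensor(2.0))

for tok, s in zip(tokenizer.convert_ids_to_tokens(ids[0, 1:].tolist()), surprisal):
    print(f"{tok:>12s}  {s.item():5.2f} bits")
```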
A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior
Re, Francesco Ignazio, Opedal, Andreas, Manaiev, Glib, Giulianelli, Mario, Cotterell, Ryan
Reading is a process that unfolds across space and time, alternating between fixations where a reader focuses on a specific point in space, and saccades where a reader rapidly shifts their focus to a new point. An ansatz of psycholinguistics is that modeling a reader's fixations and saccades yields insight into their online sentence processing. However, standard approaches to such modeling rely on aggregated eye-tracking measurements and models that impose strong assumptions, ignoring much of the spatio-temporal dynamics that occur during reading. In this paper, we propose a more general probabilistic model of reading behavior, based on a marked spatio-temporal point process, that captures not only how long fixations last, but also where they land in space and when they take place in time. The saccades are modeled using a Hawkes process, which captures how each fixation excites the probability of a new fixation occurring near it in time and space. The duration time of fixation events is modeled as a function of fixation-specific predictors convolved across time, thus capturing spillover effects. Empirically, our Hawkes process model exhibits a better fit to human saccades than baselines. With respect to fixation durations, we observe that incorporating contextual surprisal as a predictor results in only a marginal improvement in the model's predictive accuracy. This finding suggests that surprisal theory struggles to explain fine-grained eye movements.
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > Costa Rica > Heredia Province > Heredia (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
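As a sketch of the self-exciting component described above, the following simulates a purely temporal Hawkes process with an exponential kernel via Ogata thinning. The paper's model is a marked spatio-temporal point process, so this omits the spatial kernel and the fixation-duration marks, and the parameter values are arbitrary.

```python
# Temporal-only Hawkes process with an exponential kernel; parameters are arbitrary.
import numpy as np

def excitation(t, events, alpha, beta):
    """Summed exponential kernels from all past events at time t."""
    return alpha * np.sum(np.exp(-beta * (t - np.asarray(events)))) if events else 0.0

def simulate_hawkes(mu=1.0, alpha=0.6, beta=2.0, horizon=10.0, seed=0):
    """Ogata thinning for intensity lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        # The intensity only decays until the next event, so lambda(t) is an upper bound.
        lam_bar = mu + excitation(t, events, alpha, beta)
        t += rng.exponential(1.0 / lam_bar)
        if t >= horizon:
            return np.asarray(events)
        if rng.uniform() <= (mu + excitation(t, events, alpha, beta)) / lam_bar:
            events.append(t)          # accepted: a new fixation onset excites later ones

onsets = simulate_hawkes()
print(len(onsets), "simulated fixation onsets:", np.round(onsets, 2))
```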
Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More Poorly
Lin, Yi-Chien, Schuler, William
As Transformers become more widely incorporated into natural language processing tasks, there has been considerable interest in using surprisal from these models as predictors of human sentence processing difficulty. Recent work has observed a positive relationship between Transformer-based models' perplexity and the predictive power of their surprisal estimates on reading times, showing that language models with more parameters and trained on more data are less predictive of human reading times. However, these studies focus on predicting latency-based measures (i.e., self-paced reading times and eye-gaze durations) with surprisal estimates from Transformer-based language models. This trend has not been tested on brain imaging data. This study therefore evaluates the predictive power of surprisal estimates from 17 pre-trained Transformer-based models across three different language families on two functional magnetic resonance imaging datasets. Results show that the positive relationship between model perplexity and model fit still obtains, suggesting that this trend is not specific to latency-based measures and can be generalized to neural measures.
- Europe > Austria > Vienna (0.14)
- North America > United States > Ohio (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (5 more...)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.91)
- Health & Medicine > Therapeutic Area (0.68)
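The evaluation logic above, asking whether a model's surprisal improves fit to a neural signal beyond baseline predictors, can be sketched as follows on synthetic data. The predictors, the held-out split, and the pattern of results are all illustrative; the paper works with fMRI BOLD time courses and surprisal from 17 Transformer-based models.

```python
# Synthetic data; stands in for fMRI BOLD responses and per-word surprisal.
import numpy as np

rng = np.random.default_rng(1)
n = 500
baseline = rng.normal(size=(n, 2))                 # e.g., word rate and word length regressors
surprisal_small = rng.normal(size=n)               # stand-in for a smaller LM's surprisal
surprisal_large = 0.5 * surprisal_small + rng.normal(scale=0.9, size=n)
bold = baseline @ np.array([0.3, -0.2]) + 0.8 * surprisal_small + rng.normal(size=n)

def held_out_r2(X, y, split=400):
    beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    resid = y[split:] - X[split:] @ beta
    return 1.0 - resid.var() / y[split:].var()

X0 = np.column_stack([np.ones(n), baseline])       # baseline-only regression
for name, s in [("small LM", surprisal_small), ("large LM", surprisal_large)]:
    gain = held_out_r2(np.column_stack([X0, s]), bold) - held_out_r2(X0, bold)
    print(f"{name}: gain in held-out R^2 from adding surprisal = {gain:.3f}")
```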
Vectors from Larger Language Models Predict Human Reading Time and fMRI Data More Poorly when Dimensionality Expansion is Controlled
Lin, Yi-Chien, Zhu, Hongao, Schuler, William
The impressive linguistic abilities of large language models (LLMs) have recommended them as models of human sentence processing, with some conjecturing a positive 'quality-power' relationship (Wilcox et al., 2023), in which language models' (LMs') fit to psychometric data continues to improve as their ability to predict words in context increases. This is important because it suggests that elements of LLM architecture, such as veridical attention to context and a unique objective of predicting upcoming words, reflect the architecture of the human sentence processing faculty, and that any inadequacies in predicting human reading time and brain imaging data may be attributed to insufficient model complexity, which recedes as larger models become available. Recent studies (Oh and Schuler, 2023) have shown this scaling inverts after a point, as LMs become excessively large and accurate, when word prediction probability (as information-theoretic surprisal) is used as a predictor. Other studies propose the use of entire vectors from differently sized LLMs, still showing positive scaling (Schrimpf et al., 2021), casting doubt on the value of surprisal as a predictor, but do not control for the larger number of predictors in vectors from larger LMs. This study evaluates LLM scaling using entire LLM vectors, while controlling for the larger number of predictors in vectors from larger LLMs. Results show that inverse scaling obtains, suggesting that inadequacies in predicting human reading time and brain imaging data may be due to substantial misalignment between LLMs and human sentence processing, which worsens as larger models are used.
- North America > United States > Ohio (0.04)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- (2 more...)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.69)
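One simple way to control for predictor count when comparing hidden-state vectors of different widths is to project every model's vectors down to the same number of components before regressing onto the psychometric signal. The sketch below does this with PCA on synthetic data; it illustrates the general idea of matching dimensionality, not the paper's exact procedure.

```python
# Synthetic data; illustrates matching predictor counts, not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(2)
n, k = 600, 16                                     # items, shared predictor budget

def top_k_components(X, k):
    """Project centered features onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def held_out_r2(X, y, split=480):
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    resid = y[split:] - X[split:] @ beta
    return 1.0 - resid.var() / y[split:].var()

vectors_small = rng.normal(size=(n, 256))          # stand-in for a small LM's hidden states
vectors_large = rng.normal(size=(n, 2048))         # stand-in for a large LM's hidden states
reading_time = vectors_small[:, :5] @ rng.normal(size=5) + rng.normal(scale=2.0, size=n)

for name, X in [("small LM", vectors_small), ("large LM", vectors_large)]:
    r2 = held_out_r2(top_k_components(X, k), reading_time)
    print(f"{name}: held-out R^2 at {k} matched components = {r2:.3f}")
```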
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Biological neural networks are shaped both by evolution across generations and by individual learning within an organism's lifetime, whereas standard artificial neural networks undergo a single, large training procedure without inherited constraints. In this preliminary work, we propose a framework that incorporates this crucial generational dimension--an "outer loop" of evolution that shapes the "inner loop" of learning--so that artificial networks better mirror the effects of evolution and individual learning in biological organisms. Focusing on language, we train a model that inherits a "model connectome" from the outer evolution loop before exposing it to a developmental-scale corpus of 100M tokens. Compared with two closely matched control models, we show that the connectome model performs better or on par on natural language processing tasks as well as alignment to human behavior and brain data. These findings suggest that a model connectome serves as an efficient prior for learning in low-data regimes, narrowing the gap between single-generation artificial models and biologically evolved neural networks.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > Middle East > Jordan (0.04)
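A minimal way to picture an inherited "model connectome" is as a fixed sparsity mask applied to a layer's weights, so that inner-loop training only updates permitted connections. The PyTorch sketch below uses a random mask purely for illustration; in the paper, the connectome comes from the outer evolutionary loop rather than being drawn at random, and the class name here is hypothetical.

```python
# Random mask for illustration; the paper derives the mask from an outer loop.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """A linear layer whose weights are gated by a fixed, inherited connectivity mask."""
    def __init__(self, d_in, d_out, density=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.bias = nn.Parameter(torch.zeros(d_out))
        # The "connectome": fixed during inner-loop (lifetime) training.
        self.register_buffer("mask", (torch.rand(d_out, d_in) < density).float())

    def forward(self, x):
        return x @ (self.weight * self.mask).T + self.bias

layer = MaskedLinear(512, 512)
x = torch.randn(8, 512)
layer(x).pow(2).mean().backward()
# Gradients vanish wherever the connectome forbids a connection.
print("trainable connections:", int(layer.mask.sum()), "of", layer.mask.numel())
print("nonzero weight grads: ", int((layer.weight.grad != 0).sum()))
```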
Language Models Grow Less Humanlike beyond Phase Transition
Aoyama, Tatsuya, Wilcox, Ethan
LMs' alignment with human reading behavior (i.e. psychometric predictive power; PPP) is known to improve during pretraining up to a tipping point, beyond which it either plateaus or degrades. Various factors, such as word frequency, recency bias in attention, and context size, have been theorized to affect PPP, yet there is no current account that explains why such a tipping point exists, and how it interacts with LMs' pretraining dynamics more generally. We hypothesize that the underlying factor is a pretraining phase transition, characterized by the rapid emergence of specialized attention heads. We conduct a series of correlational and causal experiments to show that such a phase transition is responsible for the tipping point in PPP. We then show that, rather than producing attention patterns that contribute to the degradation in PPP, phase transitions alter the subsequent learning dynamics of the model, such that further training continues to degrade PPP.
- Asia > Thailand (0.14)
- North America > United States > Utah (0.14)
- Europe > Middle East > Malta (0.14)
- (3 more...)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
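Psychometric predictive power (PPP) of the kind discussed above is typically scored as the gain in held-out fit from adding a checkpoint's surprisal to a baseline reading-time regression. The sketch below shows that scoring procedure on synthetic surprisal values; it does not reproduce the tipping point itself, which in the paper is tied to the pretraining phase transition.

```python
# Synthetic surprisal values; shows the scoring, not the paper's tipping point.
import numpy as np

rng = np.random.default_rng(3)
n, split = 800, 600
length = rng.integers(2, 12, n)
log_freq = rng.normal(size=n)
true_surprisal = rng.gamma(2.0, 2.0, n)
rt = 200 + 10 * length - 15 * log_freq + 12 * true_surprisal + rng.normal(scale=30, size=n)

def delta_loglik(surprisal):
    """Held-out log-likelihood gain from adding surprisal to a baseline regression."""
    X0 = np.column_stack([np.ones(n), length, log_freq])
    X1 = np.column_stack([X0, surprisal])
    def loglik(X):
        beta, *_ = np.linalg.lstsq(X[:split], rt[:split], rcond=None)
        resid = rt[split:] - X[split:] @ beta
        return -0.5 * len(resid) * np.log(resid.var())
    return loglik(X1) - loglik(X0)

# Pretend later checkpoints estimate the underlying signal with less noise.
for tokens_seen, noise in [(1e3, 3.0), (1e4, 1.0), (1e5, 0.3)]:
    s = true_surprisal + rng.normal(scale=noise, size=n)
    print(f"checkpoint after {tokens_seen:>8.0f} tokens: delta log-lik = {delta_loglik(s):.1f}")
# In the paper, this curve rises and then plateaus or falls after the phase transition.
```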
If Attention Serves as a Cognitive Model of Human Memory Retrieval, What is the Plausible Memory Representation?
Yoshida, Ryo, Isono, Shinnosuke, Kajikawa, Kohei, Someya, Taiga, Sugimito, Yushi, Oseki, Yohei
Recent work in computational psycholinguistics has revealed intriguing parallels between attention mechanisms and human memory retrieval, focusing primarily on Transformer architectures that operate on token-level representations. However, computational psycholinguistic research has also established that syntactic structures provide compelling explanations for human sentence processing that word-level factors alone cannot fully account for. In this study, we investigate whether the attention mechanism of Transformer Grammar (TG), which uniquely operates on syntactic structures as representational units, can serve as a cognitive model of human memory retrieval, using Normalized Attention Entropy (NAE) as a linking hypothesis between model behavior and human processing difficulty. Our experiments demonstrate that TG's attention achieves superior predictive power for self-paced reading times compared to vanilla Transformer's, with further analyses revealing independent contributions from both models. These findings suggest that human sentence processing involves dual memory representations -- one based on syntactic structures and another on token sequences -- with attention serving as the general retrieval algorithm, while highlighting the importance of incorporating syntactic structures as representational units.
- Europe > Austria > Vienna (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (19 more...)
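Normalized Attention Entropy, the linking quantity mentioned above, can be written as the entropy of a query's attention distribution divided by its maximum possible value, log k, for k attended positions. The sketch below implements one common formulation on a toy attention distribution; the paper's exact definition over Transformer Grammar attention may differ in detail.

```python
# One common formulation of NAE on a toy attention distribution.
import numpy as np

def normalized_attention_entropy(attn_row):
    """Entropy of an attention distribution divided by log(k), for k attended positions."""
    p = np.asarray(attn_row, dtype=float)
    k = len(p)
    if k <= 1:
        return 0.0
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum() / np.log(k))

rng = np.random.default_rng(4)
diffuse = rng.dirichlet(np.ones(6))                 # attention spread over six elements
focused = np.array([0.9, 0.05, 0.02, 0.01, 0.01, 0.01])
print("diffuse attention, NAE:", round(normalized_attention_entropy(diffuse), 3))
print("focused attention, NAE:", round(normalized_attention_entropy(focused), 3))
# Higher NAE = less targeted retrieval, which the linking hypothesis associates
# with greater processing difficulty.
```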
Large Language Models Are Human-Like Internally
Kuribayashi, Tatsuki, Oseki, Yohei, Taieb, Souhaib Ben, Inui, Kentaro, Baldwin, Timothy
Recent cognitive modeling studies have reported that larger language models (LMs) exhibit a poorer fit to human reading behavior, leading to claims of their cognitive implausibility. In this paper, we revisit this argument through the lens of mechanistic interpretability and argue that prior conclusions were skewed by an exclusive focus on the final layers of LMs. Our analysis reveals that next-word probabilities derived from internal layers of larger LMs align with human sentence processing data as well as, or better than, those from smaller LMs. This alignment holds consistently across behavioral (self-paced reading times, gaze durations, MAZE task processing times) and neurophysiological (N400 brain potentials) measures, challenging earlier mixed results and suggesting that the cognitive plausibility of larger LMs has been underestimated. Furthermore, we first identify an intriguing relationship between LM layers and human measures: earlier layers correspond more closely with fast gaze durations, while later layers better align with relatively slower signals such as N400 potentials and MAZE processing times. Our work opens new avenues for interdisciplinary research at the intersection of mechanistic interpretability and cognitive modeling.
- North America > United States (0.28)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Japan > Honshū > Tōhoku (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
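Reading next-word probabilities out of internal layers, as described above, can be approximated with a logit-lens-style probe: each layer's hidden state is passed through the model's final layer norm and unembedding matrix. The sketch below uses GPT-2 for illustration; the paper analyses larger models and its exact procedure may differ.

```python
# Logit-lens-style probe on GPT-2; requires the transformers and torch packages.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tokenizer("The cat sat on the", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

target = tokenizer(" mat").input_ids[0]             # first token of a candidate continuation
for layer, h in enumerate(out.hidden_states):       # embedding layer plus each block
    # Project the last position's hidden state through the final norm and unembedding.
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    prob = torch.softmax(logits, dim=-1)[target].item()
    print(f"layer {layer:2d}: p(' mat' | context) = {prob:.4f}")
```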